64 research outputs found
The Lie algebra cohomology of jets
Let g be a finite-dimensional complex semisimple Lie algebra. We present a
new calculation of the continuous cohomology of the Lie algebra z g[[z]]. In
particular, we shall give an explicit formula for the Laplacian on the Lie
algebra cochains, from which we can deduce that the cohomology in each
dimension is a finite-dimensional representation of g which contains any
irreducible representation of g at most once.
Implicit reference to citations: a study of astronomy
This paper presents results on the automatic classification of pronouns within articles into those which refer to cited research and those which do not. It also discusses the automatic linking of pronouns which do refer to citations to their corresponding citations. The current study focused on the pronoun "they" as used in papers in astronomy journals. The paper describes a classifier trained on maximum entropy principles using features defined by the distance to preceding citations and the category of verbs associated with the pronoun under consideration.
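As a rough illustration of the classifier described in this abstract, the sketch below trains a binary maximum entropy model, which for two classes is equivalent to logistic regression. The verb categories, the scaling of the distance feature, the training settings and the toy examples are all assumptions made up for illustration; none of them come from the paper.

```python
import math

# Hypothetical verb categories; the paper's actual categories are not given here.
VERB_CATEGORIES = ["report", "cognition", "other"]

def featurise(distance, verb_category):
    """Bias term, scaled distance to the preceding citation, one-hot verb category."""
    one_hot = [1.0 if verb_category == c else 0.0 for c in VERB_CATEGORIES]
    return [1.0, distance / 10.0] + one_hot

def predict_proba(weights, x):
    """Model probability that the pronoun refers to a citation."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=2000, lr=0.5):
    """Fit the weights by batch gradient ascent on the log-likelihood."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in examples:
            err = y - predict_proba(weights, x)
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w + lr * g / len(examples) for w, g in zip(weights, grad)]
    return weights

# Invented toy data: (distance in sentences, verb category, refers to a citation?)
toy = [(1, "report", 1), (2, "report", 1), (1, "cognition", 1),
       (8, "other", 0), (9, "other", 0), (7, "cognition", 0)]
examples = [(featurise(d, v), y) for d, v, y in toy]
weights = train(examples)

near = predict_proba(weights, featurise(1, "report"))
far = predict_proba(weights, featurise(9, "other"))
print(f"near a citation: {near:.2f}, far from a citation: {far:.2f}")
```

On this toy data the model assigns a higher probability to a pronoun that closely follows a citation than to one far from any citation. A real system would need labelled pronouns and a principled verb taxonomy.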
Detecting Family Resemblance: Automated Genre Classification.
This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The current paper compares the role of visual layout, stylistic features and language model features in clustering documents and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.
Automating Metadata Extraction: Genre Classification
A problem that frequently arises in the management and integration of scientific data is the lack of context and semantics that would link data encoded in disparate ways. To bridge the discrepancy, it often helps to mine scientific texts to aid the understanding of the database. Mining relevant text can be significantly aided by the availability of descriptive and semantic metadata. The Digital Curation Centre (DCC) has undertaken research to automate the extraction of metadata from documents in PDF ([22]). Documents may include scientific journal papers, lab notes or even emails. We suggest genre classification as a first step toward automating metadata extraction. The classification method will be built on looking at the documents from five directions: as an object of specific visual format, a layout of strings with characteristic grammar, an object with stylometric signatures, an object with meaning and purpose, and an object linked to previously classified objects and external sources. Some results of experiments in relation to the first two directions are described here; they are meant to be indicative of the promise underlying this multi-faceted approach.
Examining Variations of Prominent Features in Genre Classification.
This paper investigates the correlation between features of three types (visual, stylistic and topical) and genre classes. The majority of previous studies in automated genre classification have created models based on an amalgamated representation of a document using a combination of features. In these models, the inseparable roles of different features make it difficult to determine a means of improving the classifier when it exhibits poor performance in detecting selected genres. In this paper we use classifiers independently modeled on three groups of features to examine six genre classes and show that the strongest features for making one classification are not necessarily the best features for carrying out another.
Metadata and Other Stories Online: Is Metadata a Love Letter to the Future?
No abstract available
Data, Information, and Knowledge: "where is the Life we have lost in living?"
This abstract attempts to raise the question of whether current practices in digital preservation properly address the issues of findability of digital objects. It is also intended as a starting point for discussing preservation of digital information in contrast to digital data. The abstract is exploratory and informal.
Feature Type Analysis in Automated Genre Classification
In this paper, we compare classifiers based on language model, image, and stylistic features for automated genre classification. The majority of previous studies in genre classification have created models based on an amalgamated representation of a document using a multitude of features. In these models, the inseparable roles of different features make it difficult to determine a means of improving the classifier when it exhibits poor performance in detecting selected genres. By independently modeling and comparing classifiers based on features belonging to three types, describing visual, stylistic, and topical properties, we demonstrate that different genres have distinctive feature strengths.
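The comparison described in this abstract can be pictured with a small sketch: given the predictions of three independently trained classifiers (one per feature type), compute the per-genre recall of each and report which feature type does best for each genre. The function names, genre labels and predictions below are invented for illustration, and recall stands in for whatever evaluation measure the paper actually uses.

```python
from collections import defaultdict

def per_genre_recall(gold, predicted):
    """Fraction of documents of each genre that were labelled correctly."""
    total, correct = defaultdict(int), defaultdict(int)
    for g, p in zip(gold, predicted):
        total[g] += 1
        correct[g] += (g == p)
    return {genre: correct[genre] / total[genre] for genre in total}

def strongest_feature_type(gold, predictions_by_type):
    """For each genre, name the feature type whose classifier has the highest recall."""
    recalls = {t: per_genre_recall(gold, p) for t, p in predictions_by_type.items()}
    return {genre: max(recalls, key=lambda t: recalls[t][genre]) for genre in set(gold)}

# Invented toy labels and predictions, not results from the paper.
gold = ["thesis", "thesis", "form", "form"]
predictions_by_type = {
    "visual":    ["thesis", "form", "form", "form"],
    "stylistic": ["thesis", "thesis", "thesis", "form"],
    "topical":   ["form", "thesis", "thesis", "thesis"],
}
best = strongest_feature_type(gold, predictions_by_type)
print(best)
```

Keeping the classifiers separate is what makes this per-genre breakdown possible; with a single amalgamated feature vector the contribution of each feature type could not be read off in this way.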
Building a Document Genre Corpus: a Profile of the KRYS I Corpus
This paper describes the KRYS I corpus (http://www.krys-corpus.eu/Info.html), consisting of documents classified into 70 genre classes. It has been constructed as part of an effort to automate document genre classification as distinct from topic detection. Previously there has been very little work on building corpora of texts which have been classified using a non-topical genre palette. This is partly because genre, as a concept, is rooted in philosophy, rhetoric and literature, and is highly complex and domain-dependent in its interpretation ([11]). The usefulness of genre in everyday information search is only now starting to be recognised, and there is no genre classification schema that has been consolidated to have applicable value in this direction. By presenting here our experiences in constructing the KRYS I corpus, we hope to shed light on information gathering and seeking behaviour and the role of genre in these activities, as well as a way forward for creating a better corpus for testing automated genre classification tasks and the application of these tasks to other domains.
Formulating representative features with respect to document genre classification
Genre classification (e.g. whether a document
is a scientific article or magazine article) is closely
bound to the physical and conceptual structure of the document
as well as the level of depth involved in the text.
Hence, it provides a means of ranking documents retrieved
by search tools according to metrics other than
topical similarity. Moreover, the structural information
derived from genre classification can be used to locate
target information within the text. In previous studies,
the detection of genre classes has been attempted
by using some normalised frequency of terms or combinations
of terms in the document (here, we are using
term as a reference to words, phrases, syntactic
units, sentences and paragraphs, as well as other patterns
derived from deeper linguistic or semantic analysis).
These approaches largely neglect how the term is
distributed throughout the document. Here, we report
the results of automated experiments based on distributive
statistics of words in order to present evidence that
the term distribution pattern is a better indicator of genre
class than term frequency.
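To make the contrast between term frequency and term distribution concrete, here is a minimal sketch. It assumes one simple distributional statistic (the mean and spread of a term's relative positions in the document); the paper's actual statistics are not specified in this abstract, and the function names and toy documents are illustrative only.

```python
from statistics import mean, pstdev

def term_frequency(tokens, term):
    """Normalised frequency: occurrences of `term` divided by document length."""
    return tokens.count(term) / len(tokens) if tokens else 0.0

def term_distribution(tokens, term):
    """Mean and spread of the relative positions (0 to 1) at which `term` occurs."""
    n = len(tokens)
    if n < 2:
        return None
    positions = [i / (n - 1) for i, tok in enumerate(tokens) if tok == term]
    if not positions:
        return None
    return mean(positions), pstdev(positions)

# Two toy documents with the same frequency of "method" but different placement.
clustered = ["method", "method", "a", "b", "c", "d", "e", "f"]
spread = ["method", "a", "b", "c", "d", "e", "f", "method"]

for name, doc in [("clustered", clustered), ("spread", spread)]:
    print(name, term_frequency(doc, "method"), term_distribution(doc, "method"))
```

Both toy documents have the same normalised frequency for "method", so a frequency-based feature cannot tell them apart, whereas the positional statistics differ.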